Markov Decision Processes with General Discount Functions

Authors

  • Yair Carmon
  • Adam Shwartz
Abstract

In Markov Decision Processes, the discount function determines how much the reward at each point in time adds to the value of the process, and thus deeply affects the optimal policy. Two cases of discount functions are well known and analyzed. The first is no discounting at all, which corresponds to the total- and average-reward criteria. The second is a constant discount rate, which leads to a decreasing exponential discount function. However, other discount functions appear in many models, including those of human decision-making and learning, making it interesting and possibly useful to investigate other functions. We review results for a weighted sum of several discount functions with different cost functions, showing that finite models with this criterion have optimal policies which are stationary from a fixed time N, aptly called N-stationary. We review a proof of their existence and an algorithm for their computation, and remark on the structure of these policies as the discount factors vary. We then discuss two attempts to generalize the results for weighted exponential discount functions. The first is a hypothesis for a sum of different general discount functions with certain exponential bounds, in the spirit of the results for the exponential case. We show via a counterexample that, despite the intuitive appeal of the hypothesis, it is in fact not true, and make some remarks on why this is so. Our second attempt at generalization is to represent a general discount function as an infinite sum of decreasing exponential functions with constant coefficients. We give convergence conditions on the sum under which the previously established results can be extended, enabling us to find an optimal policy for it. We discuss two examples that clarify our results and connect them to areas which require non-exponential discount functions. The work is concluded by an example of a model with a monotonic discount function that has no optimal N-stationary policy.
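
As a quick orientation for the abstract above (in notation chosen here, which need not match the authors'), the value of a policy \pi under a general discount function d, together with the special cases mentioned, can be sketched as

    V^{\pi}(s) = \mathbb{E}^{\pi}_{s}\Big[ \sum_{t=0}^{\infty} d(t)\, r(s_t, a_t) \Big]      (general discount function d)
    d(t) = \beta^{t}, \quad 0 < \beta < 1                                                    (constant discount rate, i.e. exponential discounting)
    d(t) = \sum_{i=1}^{k} w_i\, \beta_i^{t}                                                  (weighted sum of several exponential discount functions)
    d(t) = \sum_{i=1}^{\infty} w_i\, \beta_i^{t}                                             (infinite sum of decreasing exponentials, under convergence conditions on the sum)

No discounting corresponds to d(t) \equiv 1, which leads to the total- and average-reward criteria.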


Related articles

Markov decision processes with exponentially representable discounting

We generalize the geometric discount of finite discounted cost Markov Decision Processes to “exponentially representable” discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal “N-stationary” policies in general do not exist.
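
In notation chosen here (the abstract does not fix the symbols), an "exponentially representable" discount function and an N-stationary policy might be written as

    d(t) = \sum_{i=1}^{\infty} c_i\, \lambda_i^{t}, \qquad \lambda_i \in (0, 1),
    \pi = (\pi_0, \pi_1, \dots, \pi_{N-1}, \sigma, \sigma, \sigma, \dots),

where the coefficients c_i satisfy suitable summability conditions (spelled out in the paper) and the same decision rule \sigma is applied at every time from N onward.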


Eventually-stationary policies for Markov decision models with non-constant discounting

We investigate the existence of simple policies in finite discounted cost Markov Decision Processes when the discount factor is not constant. We introduce a class called “exponentially representable” discount functions. Within this class we prove the existence of optimal policies which are eventually stationary, that is, stationary from some time N onward, and provide an algorithm for their computation. Outside this c...
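
The abstract does not spell out the algorithm, so the following is only a hypothetical sketch on a toy model, not the authors' method: truncated backward induction under a non-constant discount function, illustrating how the greedy action can depend on the time index early on and settle into a single decision rule later, i.e. become eventually stationary.

    # Hypothetical sketch, not the algorithm from the paper: truncated backward
    # induction for a toy 2-state, 2-action MDP under a non-constant discount
    # function d(t).
    import numpy as np

    n_states, n_actions, horizon = 2, 2, 60

    # P[a, s, s2]: probability of moving from s to s2 under action a (toy data).
    P = np.array([[[0.9, 0.1],
                   [0.2, 0.8]],
                  [[0.5, 0.5],
                   [0.6, 0.4]]])
    # r[s, a]: one-step reward for taking action a in state s (toy data).
    r = np.array([[1.0, 0.2],
                  [0.0, 1.5]])

    def d(t):
        # Example non-constant discount function: a weighted sum of exponentials.
        return 0.5 * 0.9 ** t + 0.5 * 0.5 ** t

    V = np.zeros(n_states)                      # value-to-go beyond the horizon
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        # Q[s, a] = d(t) * r(s, a) + sum over s2 of P(s2 | s, a) * V(s2)
        Q = d(t) * r + np.einsum('asn,n->sa', P, V)
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)

    print("greedy actions for the first 10 time steps (rows = t):")
    print(policy[:10])
    print("policy constant over the second half of the horizon:",
          bool((policy[horizon // 2:] == policy[-1]).all()))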


Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

This paper presents sufficient conditions for the existence of stationary optimal policies for average-cost Markov Decision Processes with Borel state and action sets and with weakly continuous transition probabilities. The one-step cost functions may be unbounded, and action sets may be noncompact. The main contributions of this paper are: (i) general sufficient conditions for the existence of ...
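
For reference, the average-cost criterion referred to here is standard; in one common notation,

    w(s, \pi) = \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}_{s}\Big[ \sum_{t=0}^{N-1} c(s_t, a_t) \Big],

and the question is when this quantity is minimized, simultaneously for every initial state s, by a stationary policy, i.e. one that applies a single decision rule at every time step.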


On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies

This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs...
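
For readers unfamiliar with the terminology, an (s, S) inventory policy in one common convention orders up to level S whenever the inventory position x falls below s and otherwise orders nothing:

    a(x) = \begin{cases} S - x, & x < s, \\ 0, & x \ge s. \end{cases}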


Sensitive Discount Optimality via Nested Linear Programs for Ergodic Markov Decision Processes

In this paper we discuss sensitive discount optimality for Markov decision processes. The n-discount optimality is a refined selective criterion that generalizes the average optimality and the bias optimality. Our approach is based on a system of nested linear programs. In the last section we provide an algorithm for the computation of the Blackwell optimal policy. The n-disco...
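
The n-discount optimality criterion mentioned here is standard; in one common notation, a policy \pi^{*} is n-discount optimal if, for every policy \pi and every state s,

    \liminf_{\beta \uparrow 1}\, (1 - \beta)^{-n}\, \big[ V_{\beta}^{\pi^{*}}(s) - V_{\beta}^{\pi}(s) \big] \ge 0,

where V_{\beta} denotes the usual \beta-discounted value. The case n = -1 recovers average optimality, n = 0 gives bias optimality, and in finite models a Blackwell optimal policy is n-discount optimal for every n.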



Publication date: 2007